feat: warehouse transformer #5205

Open
wants to merge 3 commits into
base: master

achettyiitr
Member

@achettyiitr achettyiitr commented Oct 17, 2024

Description

  • Adds warehouse transformation as a module.
  • It is essentially a port of the rudder-transformer warehouse module from JavaScript to Go.
  • More about it can be found in this doc.
  • Currently, the flag Processor.enableWarehouseTransformations enables warehouse transformations alongside the Processor.
    • Once enabled, we make an API call to the warehouse transformer package.
    • We compare the responses and, if we see any differences, log them for debugging purposes.
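
The flag-gated comparison described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `response`, `transformFunc` and `shadowCompare` are hypothetical names, and it assumes the existing transformer's response remains the one used downstream while the new one is only compared and logged.

```go
package main

import (
	"fmt"
	"reflect"
)

// response is a simplified, hypothetical stand-in for a transformer response.
type response struct {
	Output       []map[string]any
	FailedEvents int
}

// transformFunc is a hypothetical signature shared by both implementations.
type transformFunc func(events []map[string]any) response

// shadowCompare always returns the legacy response. When enabled is true it
// also runs the embedded implementation and reports any mismatch through logf.
func shadowCompare(enabled bool, legacy, embedded transformFunc, events []map[string]any, logf func(format string, args ...any)) response {
	legacyResp := legacy(events)
	if !enabled {
		return legacyResp
	}
	embeddedResp := embedded(events)
	if !reflect.DeepEqual(legacyResp, embeddedResp) {
		logf("warehouse transformer mismatch: legacy=%v embedded=%v", legacyResp, embeddedResp)
	}
	return legacyResp
}

func main() {
	legacy := func(events []map[string]any) response {
		return response{Output: events}
	}
	embedded := func(events []map[string]any) response {
		return response{Output: events[:0]} // deliberately different, to trigger the log
	}
	events := []map[string]any{{"event": "track"}}
	resp := shadowCompare(true, legacy, embedded, events, func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	})
	fmt.Println(len(resp.Output))
}
```

Because the legacy response is always returned, turning the flag on in this sketch only adds logging and cannot change what downstream code sees.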

Linear Ticket

  • Resolves WAR-187

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch from a941966 to 22b287b Compare October 17, 2024 07:49
@achettyiitr achettyiitr requested a review from lvrach October 17, 2024 07:49
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch from 22b287b to 5ae4838 Compare October 17, 2024 07:51
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch 3 times, most recently from 1a41ba4 to e414cca Compare October 18, 2024 04:45

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 77.80963% with 387 lines in your changes missing coverage. Please review.

Project coverage is 74.94%. Comparing base (e472c28) to head (964c7c0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| warehouse/transformer/events.go | 56.98% | 154 Missing and 74 partials ⚠️ |
| warehouse/transformer/set.go | 63.86% | 31 Missing and 12 partials ⚠️ |
| warehouse/transformer/logger.go | 63.07% | 17 Missing and 7 partials ⚠️ |
| warehouse/transformer/testhelper/testhelper.go | 20.83% | 19 Missing ⚠️ |
| warehouse/transformer/safe.go | 92.30% | 11 Missing and 4 partials ⚠️ |
| warehouse/transformer/idresolution.go | 90.97% | 8 Missing and 4 partials ⚠️ |
| warehouse/transformer/testhelper/outputbuilder.go | 75.00% | 8 Missing and 4 partials ⚠️ |
| warehouse/transformer/transformer.go | 93.95% | 8 Missing and 1 partial ⚠️ |
| ...rmer/internal/reservedkeywords/reservedkeywords.go | 73.91% | 4 Missing and 2 partials ⚠️ |
| warehouse/transformer/datatype.go | 94.28% | 4 Missing ⚠️ |

... and 5 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5205      +/-   ##
==========================================
+ Coverage   74.78%   74.94%   +0.15%     
==========================================
  Files         440      458      +18     
  Lines       61507    63248    +1741     
==========================================
+ Hits        46000    47403    +1403     
- Misses      12966    13196     +230     
- Partials     2541     2649     +108     


@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch 4 times, most recently from 404773f to c798625 Compare October 21, 2024 01:40
@achettyiitr achettyiitr requested a review from lvrach October 21, 2024 01:42
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch 2 times, most recently from 9f36c96 to 3f956b4 Compare October 21, 2024 05:47
processor/processor.go (outdated, resolved)
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch 2 times, most recently from 11996b9 to 14aa043 Compare October 21, 2024 09:54
@achettyiitr achettyiitr requested a review from ktgowtham October 21, 2024 09:54
@github-actions github-actions bot added the Stale label Nov 11, 2024
@github-actions github-actions bot closed this Nov 18, 2024
@achettyiitr achettyiitr reopened this Dec 18, 2024
@github-actions github-actions bot removed the Stale label Dec 19, 2024
@github-actions github-actions bot added the Stale label Jan 9, 2025
@rudderlabs rudderlabs deleted a comment from github-actions bot Jan 14, 2025
@rudderlabs rudderlabs deleted a comment from github-actions bot Jan 14, 2025
@achettyiitr achettyiitr requested review from lvrach and ktgowtham and removed request for ktgowtham and lvrach January 15, 2025 08:14
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch 9 times, most recently from 6934017 to dcc7800 Compare January 20, 2025 07:52
@achettyiitr achettyiitr marked this pull request as ready for review January 20, 2025 08:33
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch from dcc7800 to a527f87 Compare January 20, 2025 08:56
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch from a527f87 to 546af2a Compare January 20, 2025 09:03
processor/processor.go (outdated, resolved)
processor/processor.go (outdated, resolved)
processor/processor.go (outdated, resolved)
processor/processor.go (outdated, resolved)
processor/processor.go (outdated, resolved)
processor/transformer/transformer.go (resolved)
warehouse/transformer/datatype.go (resolved)
Comment on lines +100 to +106
"data": data,
"metadata": map[string]any{
    "table":      table,
    "columns":    columns,
    "receivedAt": tec.event.Metadata.ReceivedAt,
},
"userId": "",
Member

This map is being created across multiple functions. Should we consider introducing stricter typing by replacing it with a struct?

Member Author

We could create a struct holding data, metadata and userID. But it would later be used only to populate the map again, which is basically TransformerResponse.Output. I don't see much value in defining the struct here.
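
For reference, the trade-off discussed in this thread can be sketched as below. This is only an illustration of the reviewer's suggestion, not code from the PR: `eventOutput`, its field types and `toMap` are hypothetical, and the types of `columns` and `receivedAt` are assumptions.

```go
package main

import "fmt"

// eventOutput is a hypothetical typed view of one transformed event.
// The field types are assumptions; the diff only shows the map keys.
type eventOutput struct {
	Data       map[string]any
	Table      string
	Columns    map[string]string
	ReceivedAt string
	UserID     string
}

// toMap converts the typed value back into the loosely typed shape
// that the diff builds by hand.
func (o eventOutput) toMap() map[string]any {
	return map[string]any{
		"data": o.Data,
		"metadata": map[string]any{
			"table":      o.Table,
			"columns":    o.Columns,
			"receivedAt": o.ReceivedAt,
		},
		"userId": o.UserID,
	}
}

func main() {
	out := eventOutput{
		Data:       map[string]any{"plan": "pro"},
		Table:      "tracks",
		Columns:    map[string]string{"plan": "string"},
		ReceivedAt: "2024-10-17T07:49:00Z",
	}
	fmt.Println(out.toMap()["metadata"])
}
```

The struct gives compile-time checking of the field names in one place, at the cost of the extra conversion step the author points out.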

whutils "github.com/rudderlabs/rudder-server/warehouse/utils"
)

func (t *Transformer) trackEvents(tec *transformEventContext) ([]map[string]any, error) {
Member

I think we should define these methods on the transformEventContext struct instead of passing tec as an argument in all the functions.

Member Author

  • transformEventContext is a data model. I don't think we should define these methods on a data model.
  • Also, if we did that, we would have to pass in everything else the methods need, such as config and logger, in order to read configuration or log anything.
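
The separation the author argues for can be sketched as follows: dependencies live on the service type, and the context stays a plain data holder passed as an argument. All names and fields here (`eventContext`, `transformer`, `maxColumns`, `logf`) are hypothetical simplifications, not the PR's definitions.

```go
package main

import "fmt"

// eventContext is a hypothetical plain data model, standing in for
// transformEventContext. It carries data only, no dependencies.
type eventContext struct {
	EventType string
	Payload   map[string]any
}

// transformer owns the dependencies (a config value and a logger here),
// so methods on it can use them without threading them through the context.
type transformer struct {
	maxColumns int
	logf       func(format string, args ...any)
}

// trackEvents takes the data model as an argument and reads config and
// logger from its receiver.
func (t *transformer) trackEvents(ec *eventContext) ([]map[string]any, error) {
	if len(ec.Payload) > t.maxColumns {
		return nil, fmt.Errorf("event has %d columns, limit is %d", len(ec.Payload), t.maxColumns)
	}
	t.logf("transforming %s event", ec.EventType)
	return []map[string]any{{"data": ec.Payload}}, nil
}

func main() {
	t := &transformer{maxColumns: 2, logf: func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	}}
	out, err := t.trackEvents(&eventContext{
		EventType: "track",
		Payload:   map[string]any{"plan": "pro"},
	})
	fmt.Println(len(out), err)
}
```

Moving `trackEvents` onto `eventContext` instead would require the context to carry `maxColumns` and `logf` as well, which is the coupling the second bullet describes.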

warehouse/transformer/idresolution.go (resolved)
warehouse/transformer/internal/rules/rules.go (resolved)
warehouse/transformer/jsonpaths.go (outdated, resolved)
warehouse/transformer/safe.go (resolved)
if err != nil {
    return "", err
}
tec.cache.safeTableNameCache.Store(cacheKey, tableName)
Member

What is the benefit of creating a cache? Does the "transform" function involve heavy computation?

Member Author

@achettyiitr achettyiitr Jan 30, 2025

No, it's not heavy computation, but it still involves a couple of string manipulations, which can be costly when the string is large. Since the result for a given key doesn't change, caching it here should be fine. If we get a batch of 1000 events, each with 100 keys and 100K properties, these small improvements will help greatly.
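
A minimal sketch of this kind of per-key memoization is shown below. The `Store` call in the diff suggests something like `sync.Map`, but that is an assumption; `nameCache`, `getOrCompute` and `toSnakeLower` are hypothetical names, and `toSnakeLower` is only a stand-in for the real name transformation.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// nameCache memoizes the result of a name transformation per input key.
type nameCache struct {
	entries sync.Map // maps string keys to string results
}

// getOrCompute returns the cached value for key, computing and storing
// it on a miss. Errors are returned to the caller and never cached.
func (c *nameCache) getOrCompute(key string, compute func(string) (string, error)) (string, error) {
	if v, ok := c.entries.Load(key); ok {
		return v.(string), nil
	}
	name, err := compute(key)
	if err != nil {
		return "", err
	}
	c.entries.Store(key, name)
	return name, nil
}

// toSnakeLower is a hypothetical stand-in for the real transformation:
// a few cheap string operations whose result never changes for a given key.
func toSnakeLower(s string) (string, error) {
	return strings.ToLower(strings.ReplaceAll(strings.TrimSpace(s), " ", "_")), nil
}

func main() {
	var cache nameCache
	for i := 0; i < 3; i++ {
		// Only the first iteration runs toSnakeLower; the rest hit the cache.
		name, _ := cache.getOrCompute("Product Added", toSnakeLower)
		fmt.Println(name)
	}
}
```

Two goroutines that miss at the same time may both compute the value in this sketch; that is harmless here because the transformation is deterministic.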

// transformColumnName applies transformation to the input column name based on the destination type and configuration options.
// If `useBlendoCasing` is enabled, it transforms the column name into Blendo casing.
// Otherwise, it applies a more general transformation using the `transformName` function.
func transformColumnName(destType string, intrOpts *intrOptions, destOpts *destOptions, columnName string) string {
Member

transformColumnName shares almost all of its code with the transformTableName function. Should we extract the common code?

Member Author

Moving this common code into a separate function would be too much abstraction.

warehouse/transformer/safe.go (outdated, resolved)
warehouse/transformer/set.go (resolved)
warehouse/transformer/set.go (resolved)
Comment on lines +113 to +115
require.NotEmpty(t, wResponse.FailedEvents[i].Error)

require.NotZero(t, pResponse.FailedEvents[i].StatusCode)
Member

Shouldn't we also check that Error and StatusCode are the same in pResponse and wResponse?

Member Author

No, the error strings might differ. We can't really compare them, because JavaScript handles errors differently from Go. So we only check whether we received failed events or not.

Member

OK. What about the status code?

Member Author

When an exception occurs in JavaScript, we just return an internal error, whereas the Go code handles those errors appropriately.
It should be fine, as we are mostly concerned with the Output.
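
The comparison policy described in this thread can be sketched as below: outputs must match exactly, while failed events are compared only by count, ignoring error text and status code. The types and the `equivalent` helper are hypothetical simplifications, not the PR's test helper.

```go
package main

import (
	"fmt"
	"reflect"
)

// failedEvent and transformResponse are simplified, hypothetical stand-ins
// for the response types compared in the test.
type failedEvent struct {
	Error      string
	StatusCode int
}

type transformResponse struct {
	Events       []map[string]any
	FailedEvents []failedEvent
}

// equivalent reports whether two responses agree on their successful output
// and on how many events failed. Error strings and status codes are ignored
// on purpose, since the two implementations report failures differently.
// Note that reflect.DeepEqual treats a nil slice and an empty slice as different.
func equivalent(a, b transformResponse) bool {
	if len(a.FailedEvents) != len(b.FailedEvents) {
		return false
	}
	return reflect.DeepEqual(a.Events, b.Events)
}

func main() {
	events := []map[string]any{{"id": "1"}}
	js := transformResponse{
		Events:       events,
		FailedEvents: []failedEvent{{Error: "TypeError: cannot read property", StatusCode: 500}},
	}
	golang := transformResponse{
		Events:       events,
		FailedEvents: []failedEvent{{Error: "invalid property type", StatusCode: 400}},
	}
	fmt.Println(equivalent(js, golang))
}
```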

warehouse/transformer/testhelper/validate.go (resolved)
warehouse/transformer/types.go (resolved)
warehouse/transformer/transformer.go (outdated, resolved)
warehouse/transformer/transformer.go (outdated, resolved)
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch 2 times, most recently from 1654eca to 45dcaa9 Compare January 31, 2025 05:04
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch from 45dcaa9 to 964c7c0 Compare January 31, 2025 06:33
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch from 964c7c0 to b8089e9 Compare February 3, 2025 03:39
@achettyiitr achettyiitr force-pushed the feat.warehouse-transformer branch from b8089e9 to 8897f5d Compare February 3, 2025 03:57
4 participants